Inner Join
This lesson demonstrates how to perform an inner join.
We'll cover the following
Inner Join#
In the previous lesson, we saw how to join a table with itself. In this lesson we’ll join two different tables. We’ll introduce another table called DigitalAssets that’ll contain the online public properties such as Twitter, Facebook, and Pinterest belonging to a celebrity. The table structure is shown below:
Column Name | Column Type |
---|---|
URL | VARCHAR(200) |
AssetType | Enum(‘Facebook’,‘Twitter’,‘Instagram’,‘Pinterest’,‘Website’) |
LastUpdatedOn | TIMESTAMP |
ActorId | INT |
Note that the primary key of the table is the URL column as every URL is guaranteed to be unique. The DigitalAssets table is linked with the Actors table with the common column of ID for the actor as shown below. However, note that the column names in the two tables are different.
Syntax#
SELECT *
FROM table1
INNER JOIN table2
ON <join condition>;
Connect to the terminal below by clicking in the widget. Once connected, the command line prompt will show up. Enter or copy and paste the command ./DataJek/Lessons/26lesson.sh and wait for the MySQL prompt to start-up.
-
Using the INNER JOIN, we are now able to answer queries such as listing the Facebook pages for each celebrity. Note that each table in isolation can’t answer this query as the Actors table doesn’t hold the digital assets information for each actor and the DigitalAssets table doesn’t hold the names for each actor.
SELECT FirstName, SecondName, AssetType, URL
FROM Actors
INNER JOIN DigitalAssets
ON Actors.Id = DigitalAssets.ActorID;
-
If the two tables had the same column name for the actor’s ID then we could have used the alternative syntax with USING clause to make the query slightly less verbose as shown below:
SELECT FirstName, SecondName, AssetType, URL
FROM Actors
INNER JOIN DigitalAssets
USING(Id);Note that the columns listed in the SELECT clause are unique across the two tables. However, if the two tables had columns with the same names then we would need to disambiguate the two by fully qualifying the column with the table name.
Also notice that celebrities with no digital assets, or assets with no corresponding celebrity entries, in the Actors table aren’t captured with the results of the query. The server picks rows from both tables that have the same value for the two columns. Or you can think of it as an intersection of the two tables based on the IDs of the celebrities.
-
It’s not necessary to use the INNER JOIN clause to get an inner join between two tables. We can also use the WHERE clause to achieve the same effect as shown below:
SELECT FirstName, SecondName, AssetType, URL
FROM Actors, DigitalAssets
WHERE ActorId=Id;
There’s no difference in using the WHERE clause or the INNER JOIN clause in query performance, rather it is just a matter of taste.
-
We can also create a cartesian product between the two tables as we did in the self join section. We can use either the where or the inner join syntax. Both are shown below:
SELECT FirstName, SecondName, AssetType, URL
FROM Actors, DigitalAssets;Or,
SELECT FirstName, SecondName, AssetType, URL
FROM Actors
INNER JOIN DigitalAssets;
-
We can join any two columns from two tables that have the same type, or which can be converted to one another albeit with data loss. For instance, the following two queries are nonsensical, but the tables can still be joined on the columns that appear in the queries.
-- Makes no sense to join tables on FirstName and URL columns as they aren't related.
SELECT *
FROM Actors
INNER JOIN DigitalAssets ON URL = FirstName;
Or
-- Again no sense in combining net worth and actor id. Additionally, one is an int and the other a decimal but still comparable.
SELECT *
FROM Actors
INNER JOIN DigitalAssets
ON NetWorthInMillions = ActorId;
Both the queries result in empty sets.